Abstract: Network Utility Maximization (NUM) provides the key conceptual framework to study resource allocation amongst a collection of users/entities across disciplines as diverse as economics, law and engineering. In network engineering, this framework has been particularly insightful for understanding how Internet protocols allocate bandwidth, and has motivated diverse research on distributed mechanisms to maximize network utility while incorporating new relevant constraints on energy/power, storage, stability, etc., for systems ranging from communication networks to the smart grid. However, when the available resources and/or users' utilities vary over time, a user's allocations will tend to vary, which in turn may have a detrimental impact on the users' utility or quality of experience.

This paper introduces a generalized NUM framework which explicitly incorporates the detrimental impact of temporal variability in a user's allocated rewards. It explicitly incorporates tradeoffs amongst the mean and variability in users' allocations. We propose an online algorithm to realize variance-sensitive NUM, which, under stationary ergodic assumptions, is shown to be asymptotically optimal, i.e., it achieves a time-average equal to that of an offline algorithm with knowledge of the future variability in the system. This substantially extends work on NUM to an interesting class of relevant problems where users/entities are sensitive to temporal variability in their service or allocated rewards.



I. Introduction 

Network Utility Maximization (NUM) provides the key conceptual framework to study (fair) resource allocation among a collection of users/entities across disciplines as diverse as economics, law and engineering. In network engineering, this framework has recently served as a particularly insightful setting in which to study (reverse engineer) how the Internet's congestion control protocols allocate bandwidth and how to devise schedulers for wireless systems with time-varying channel capacities. It has also motivated the development of distributed mechanisms to maximize network utility in diverse settings, including communication networks and the smart grid, while incorporating new relevant constraints on energy, storage, power control, stability, etc. However, when the available resources and/or users' utilities vary over time, allocations amongst users will tend to vary, which in turn may have a detrimental impact on the users' utility or perceived service quality.

Indeed, temporal variability in utility, service, resources or associated prices is particularly problematic when humans are the eventual recipients of the allocations. Humans typically view temporal variability negatively, as a sign of an unreliable service or of network or market instability; filtered through humans' cognitive and behavioral responses, such variability can, and will, translate into a degraded Quality of Experience (QoE). For example, temporal variability in video quality has been shown to lead to hysteresis effects in humans' quality judgments, which can substantially degrade a user's QoE. This in turn can lead users to make decisions, e.g., change provider or act upon perceived market instabilities, which can have serious implications for businesses, engineered systems, or economic markets.

This paper introduces a generalized NUM framework which explicitly incorporates the detrimental impact of temporal variability in a user's allocated rewards. We use the term rewards as a proxy representing the resulting utility of, or any other quantity associated with, allocations to users/entities in a system. Our goal is to explicitly tackle the task of incorporating tradeoffs amongst the mean and variability in users' rewards. Thus, for example, in a variance-sensitive NUM setting, it may make sense to reduce a user's mean reward so as to reduce its variability. As will be discussed in the sequel, there are many ways in which temporal variations can be accounted for, and these, in fact, present distinct technical challenges. In this paper we take a simple, elegant approach that addresses systems where tradeoffs amongst the mean and variability over time need to be made, rather than systems where the mean (or target) is known, or where the issue at hand is the cumulative variance at the end of a given (e.g., investment) period.

To better describe the characteristics of the problem, we introduce some preliminary notation. We consider a network shared by a set $\mathcal{N}$ of users (or other entities), where $|\mathcal{N}| = N$ denotes the number of users in the system. Throughout the paper, we distinguish between random variables (and random functions) and their realizations by using upper case letters for the former and lower case for the latter. We use bold letters to denote vectors, e.g., $\mathbf{a} = (a_i : i \in \mathcal{N})$. We let $(a)_{1:T}$ denote the finite length sequence $(a(t) : 1 \le t \le T)$. For a function $U$ on $\mathbb{R}$, $U'$ denotes its derivative.

Thus, if $r_i(t)$ represents the reward allocated to user $i$ at time $t$, then $r(t) = (r_i(t) : i \in \mathcal{N})$ is the vector of rewards to users $\mathcal{N}$ at time $t$, and $(r)_{1:T}$ represents the rewards allocated to the same users over slots $t = 1, \ldots, T$. We assume that reward allocations are subject to time-varying network constraints,

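$$c_t(r(t)) \le 0 \quad \text{for } t = 1, \ldots, T,$$

where $c_t : \mathbb{R}^N \to \mathbb{R}$ is a convex function, thus implicitly defining a convex set of feasible reward allocations. To formally capture the impact of the time-varying resources on users' QoE, consider the following offline convex optimization problem OPT(T):

$$\max_{(r)_{1:T}} \; \sum_{i \in \mathcal{N}} \underbrace{\left( \frac{1}{T} \sum_{t=1}^{T} U_i^R(r_i(t)) - U_i^V\big(\mathrm{Var}^T((r_i)_{1:T})\big) \right)}_{\text{Proxy for user } i\text{'s QoE}}$$
$$\text{subject to} \quad c_t(r(t)) \le 0 \quad \forall\, t \in \{1, \ldots, T\},$$
$$r_i(t) \ge r_{\min} \quad \forall\, t \in \{1, \ldots, T\},\ \forall\, i \in \mathcal{N},$$

where for each $i \in \mathcal{N}$,

$$\mathrm{Var}^T((r_i)_{1:T}) = \frac{1}{T} \sum_{t=1}^{T} \left( r_i(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i(\tau) \right)^2.$$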

We refer to this as an offline optimization because the time-varying constraints $(c_t)_{1:T}$ are assumed to be known; we allow functions $(U_i^R, U_i^V)_{i \in \mathcal{N}}$ that make the optimization problem convex. Note that the first term in user $i$'s proxy QoE, $\frac{1}{T} \sum_{t=1}^{T} U_i^R(r_i(t))$, captures the degree to which QoE increases with his/her allocated rewards at any time, whereas the second term, typically increasing in $\mathrm{Var}^T(\cdot)$, penalizes temporal variability in reward allocation. Hence, this general formulation allows us to trade off between the mean and variability associated with the reward allocations by appropriately choosing the functions $(U_i^R, U_i^V)_{i \in \mathcal{N}}$.
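As a concrete reading of the objective, the following minimal Python sketch evaluates the OPT(T) objective for given per-user reward sequences. It is an illustration rather than part of the paper: the function name `phi_T` and the choices $U^R(r) = r$ and $U^V(v) = \sqrt{v + \delta}$ (the choices the paper later associates with a proposed video QoE metric) are assumptions. Two sequences with the same mean reward receive different values because only the bursty one pays the variability penalty.

```python
import math
from statistics import fmean, pvariance

def phi_T(rewards, U_R, U_V):
    """Evaluate the OPT(T) objective for per-user reward sequences.

    rewards[i] is the sequence (r_i(1), ..., r_i(T)); U_R[i] and U_V[i] are
    user i's utility functions. Var^T is the population (1/T) variance.
    """
    total = 0.0
    for r_i, u_r, u_v in zip(rewards, U_R, U_V):
        mean_utility = fmean(u_r(x) for x in r_i)  # (1/T) sum_t U^R(r_i(t))
        total += mean_utility - u_v(pvariance(r_i))
    return total

# Hypothetical choices: U^R(r) = r and U^V(v) = sqrt(v + delta), small delta.
delta = 0.01
U_R = [lambda r: r]
U_V = [lambda v: math.sqrt(v + delta)]

steady = [[2.0, 2.0, 2.0, 2.0]]  # mean 2, variance 0
bursty = [[1.0, 3.0, 1.0, 3.0]]  # mean 2, variance 1
print(phi_T(steady, U_R, U_V))   # approximately 2 - sqrt(0.01) = 1.9
print(phi_T(bursty, U_R, U_V))   # approximately 2 - sqrt(1.01)
```

Both sequences deliver the same mean reward; the difference in objective value comes entirely from the $U^V(\mathrm{Var}^T)$ term.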

A. Main result and contributions 

The main contribution of this paper is an online algorithm for Adaptive Variability-Aware Resource (AVR) allocation, which realizes variance-sensitive NUM. Under stationary ergodic assumptions on the time-varying constraints, we show that AVR is asymptotically optimal, i.e., it achieves a performance equal to that of the offline optimization OPT(T) introduced earlier. This is a strong optimality result, which at first sight may be surprising, given that the term $\mathrm{Var}^T(\cdot)$ in the objective of OPT(T) depends on reward allocations over time and that the constraints $(c_t)_t$ are time varying. The key idea exploits the characteristics of the problem by keeping online estimates of the relevant quantities associated with users' allocations, e.g., the mean, variance and mean QoE, which over time are shown to converge, and which eventually enable the online policy to produce allocations corresponding to the optimal stationary policy. Proving this result is somewhat challenging, as it requires showing that the estimates based on allocations produced by our online policy AVR (which itself depends on the estimated quantities) converge to the desired values. To our knowledge, this is the first attempt to generalize the NUM framework in this direction. In the related work below, we contrast our problem formulation and approach with past work in the literature addressing variance minimization, risk-sensitive control and other MDP-based frameworks.

B. Related Work 

Network Utility Maximization (NUM) provides the key conceptual framework to study how to allocate rewards fairly amongst a collection of users/entities. [?] provides an overview of NUM. However, all the work on NUM, including several major extensions (e.g., [?], [?], [?]), has ignored the impact of variability in reward allocation on the quality of experience of users.

Adding a variance term to the objective function takes the problem out of the general dynamic programming setting; see, e.g., [?]. Indeed, including variance in the utility/cost to users at each time means that the overall cost is not decomposable, i.e., it cannot be written as a sum of costs each dependent only on the allocation at that time; this makes sensitivity to variability challenging. For instance, [?] discusses a minimum variance controller for linear systems (Section 5.3) where the objective is the minimization of the sum of second moments of the output variable. The sum of second moments is considered instead of the variance, which allows the cumulative cost to be represented as a sum of the costs incurred over time. Note, however, that minimization of second moments does not directly address variability unless the mean is zero. The variance of the cumulative cost is incorporated in the objective of problems in risk-sensitive optimal control (see [?]) to capture the risk associated with a policy. Note, however, that this is the variance of the cumulative cost rather than the variability seen by a user over time. To summarize, to our knowledge there is no previously proposed work on NUM that addresses the negative impact of variability. The algorithm proposed here falls into the class of stochastic fixed point algorithms (see [?]). Our algorithm is also related to the algorithms proposed in [?] and [?], although these works also ignore variability.
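To see why the second moment is only a proxy for variability, recall that for any sequence the time-averaged second moment equals the variance plus the squared mean. The short check below is an illustration, not from the paper: it builds two hypothetical reward sequences with identical second moments but different variances, so a per-slot (decomposable) second-moment cost cannot distinguish them.

```python
import math
from statistics import fmean, pvariance

def second_moment(xs):
    """Time-averaged second moment (1/T) * sum_t x(t)^2, a sum of per-slot costs."""
    return fmean(x * x for x in xs)

steady = [1.0, 1.0]              # mean 1, variance 0
bursty = [0.0, math.sqrt(2.0)]   # mean sqrt(2)/2, variance 0.5

for xs in (steady, bursty):
    # Identity: second moment = variance + mean^2. The variance term couples
    # all slots through the mean, which is what breaks decomposability.
    assert math.isclose(second_moment(xs), pvariance(xs) + fmean(xs) ** 2)

print(second_moment(steady), second_moment(bursty))  # both approximately 1
print(pvariance(steady), pvariance(bursty))          # 0 versus approximately 0.5
```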

C. Organization of the paper 

In Section II, we discuss the system model and assumptions. We study the optimality conditions for OPT(T) in Section III. We introduce OPTSTAT in Section IV and study its optimality conditions. We start Section V by formally introducing our online algorithm AVR. We then carry out a convergence analysis of AVR in Subsection V-A, and conclude the section by establishing the asymptotic optimality of AVR in Subsection V-B. We conclude the paper in Section VI. Proofs of some of the intermediate results used in the paper are given in an appendix at the end of the paper.

II. System model 

We consider a slotted system where slots are indexed by $t \in \{0, 1, 2, \ldots\}$, and the system serves a fixed set of users $\mathcal{N}$; let $N = |\mathcal{N}|$.

Let $\mathbb{R}_+ = \{b \in \mathbb{R} : b \ge 0\}$. A sequence $(b(t))_t$ in a Euclidean space is said to converge to a set $A$ if

$$\lim_{t \to \infty} \inf_{a \in A} \|b(t) - a\| = 0,$$

where $\|\cdot\|$ denotes the Euclidean norm associated with the space. For a function $U$ on $\mathbb{R}$, $U'$ denotes its derivative. We use $\mathbb{I}$ as the indicator function, i.e., for any set $A$, we let $\mathbb{I}_{\{a \in A\}} = 1$ if $a \in A$, and zero otherwise.

We assume that the reward allocation $r(t) \in \mathbb{R}^N$ in slot $t$ is constrained to satisfy the inequality

$$c_t(r(t)) \le 0,$$

where $c_t$ is picked from an (arbitrarily large) finite set $\mathcal{C}$ of real-valued maps on $\mathbb{R}^N$. We make the following assumptions on these constraints:



Assumptions C.1-C.5 (Time-varying constraints)

C.1 There is a constant $r_{\min} \ge 0$ such that for any $c \in \mathcal{C}$, $c(r) \le 0$ for $r$ such that $r_i = r_{\min}$ for each $i \in \mathcal{N}$.
C.2 There is a constant $r_{\max} > 0$ such that for any $c \in \mathcal{C}$ and $r \in \mathbb{R}^N$ satisfying $c(r) \le 0$, we have $r_i \le r_{\max}$ for each $i \in \mathcal{N}$.
C.3 Each function $c \in \mathcal{C}$ is convex and differentiable on an open set containing $[r_{\min}, r_{\max}]^N$.
C.4 For any $c \in \mathcal{C}$ and $r$ such that $r_i = r_{\min}$ for each $i \in \mathcal{N}$, $c(r) < 0$, or $c(r) \le 0$ if $c$ is an affine function.
C.5 Let $(C_t)_t$ be a stationary ergodic process, and let $(\pi(c) : c \in \mathcal{C})$ denote the associated stationary distribution. We let $C^\pi$ denote a random constraint with distribution $(\pi(c) : c \in \mathcal{C})$.

We could allow the constants $r_{\min}$ and $r_{\max}$ to be user dependent, but we avoid that for notational simplicity. Condition C.4 is imposed to ensure that the constraint set is 'nice' when used as the feasible set of an optimization problem such as OPT(T) (see, e.g., Lemma 1).

Next we discuss the assumptions on the functions $(U_i^R, U_i^V)_{i \in \mathcal{N}}$. For each $i \in \mathcal{N}$, we make assumptions U.V and U.R, discussed next.

Assumptions U.V and U.R

Let $v_{\max} = (r_{\max} - r_{\min})^2$.

U.V: $U_i^V$ is defined and twice continuously differentiable on an open set containing $[0, v_{\max}]$, with $\min_{v \in [0, v_{\max}]} (U_i^V)'(v) = c_{\min,i} > 0$ and $(U_i^V)''(v) \le 0$ for all $v \in [0, v_{\max}]$. Further, we assume that for any two elements $x^1$ and $x^2$ in any Euclidean space $\mathbb{R}^d$ with $x^1 \ne x^2$, and $a \in (0, 1)$ with $\bar{a} = 1 - a$, we have

$$U_i^V\big(\|a x^1 + \bar{a} x^2\|^2\big) < a\, U_i^V\big(\|x^1\|^2\big) + \bar{a}\, U_i^V\big(\|x^2\|^2\big), \tag{1}$$

where $\|\cdot\|$ denotes the Euclidean norm associated with the space. For each $i \in \mathcal{N}$, let $\max_{v \in [0, v_{\max}]} (U_i^V)'(v) = c_{\max,i}$.

U.R: $U_i^R$ is defined and differentiable on an open set containing $[r_{\min}, r_{\max}]$. Further, we assume that $U_i^R$ is concave and strictly increasing on $[r_{\min}, r_{\max}]$.

Note that by picking, for each $i \in \mathcal{N}$, the function $U_i^V$ from the set

$$\mathcal{U}_V = \{(v + \delta)^{\alpha} : \alpha \in [0.5, 1], \text{ with } \delta > 0 \text{ if } \alpha \ne 1\},$$

we satisfy the requirements in U.V. Note that this includes the identity function $U_i^V(v) = v$. Also, the function $U_i^V(v) = \sqrt{v + \delta}$ for any (arbitrarily small) $\delta > 0$ satisfies the conditions in U.V.

We satisfy U.R if we pick the functions $(U_i^R)_{i \in \mathcal{N}}$ from the following class of strictly concave increasing functions $U_\alpha$, parametrized by $\alpha \in (0, \infty)$ ([?]):

$$U_\alpha(x) = \begin{cases} \log x & \text{if } \alpha = 1, \\ \dfrac{x^{1-\alpha}}{1-\alpha} & \text{otherwise.} \end{cases} \tag{2}$$

These functions are commonly used to obtain allocations that are $\alpha$-fair (see [?]); a larger $\alpha$ corresponds to a fairer allocation. Note that we have to ensure that $r_{\min} > 0$ for these functions to be well defined on $[r_{\min}, r_{\max}]$; even if this is not the case, we could use $U_\alpha(\cdot + \delta)$ instead of $U_\alpha(\cdot)$ for an arbitrarily small positive shift $\delta$ in the argument to avoid this requirement.

We will see later that AVR can be made more efficient if $U_i^V$ is linear for some users $i \in \mathcal{N}$. We define the following subsets of $\mathcal{N}$:

$$\mathcal{N}_{vl} = \{i \in \mathcal{N} : U_i^V \text{ is linear}\}, \qquad \mathcal{N}_{vn} = \{i \in \mathcal{N} : U_i^V \text{ is not linear}\}.$$
We focus on obtaining an algorithm for reward allocation that can be implemented at a centralized coordinator that has access to $c_t$ at the beginning of slot $t$. For instance, in a cellular network setting (as in case WN below), this could be a base station that estimates the channel strengths of the users in the network to find $c_t$.

A. Applications and scope of the model 

The presence of time-varying constraints $c_t(r) \le 0$ allows us to apply the model to several interesting and useful settings. In particular, we focus here on a wireless network setting by discussing three cases, WN, WN-E and WN-T, and show that the model can handle problems involving time-varying exogenous constraints and time-varying utility functions. We start by discussing case WN, where the reward in a slot is the rate allocated to the user in that slot. Let $\mathcal{P}$ denote a finite (but arbitrarily large) set of positive vectors, where each vector corresponds to the peak transmission rate vector for a slot seen by users in a wireless network. Let

$$\mathcal{C} = \Big\{ c_p : c_p(r) = \sum_{i \in \mathcal{N}} \frac{r_i}{p_i} - 1,\ p \in \mathcal{P} \Big\}.$$

Here, for any allocation $r$, $r_i / p_i$ is the fraction of time the wireless system needs to serve user $i$ in slot $t$ to deliver data at rate $r_i$ to user $i$ in a slot where the user has peak data transmission rate $p_i$. Thus, the constraint $c_p(r) \le 0$ can be seen as a scheduling constraint that corresponds to the requirement that the sum of the fractions of time that different users are served in a slot be less than or equal to one.

Time-varying exogenous constraints: We can also allow for time-varying exogenous constraints on the wireless system by appropriately defining the set $\mathcal{C}$. For instance, consider case WN-E, where a base station in a cellular network allocates rates to users, some of whom are streaming videos. As pointed out above, the QoE of users viewing video content is sensitive to temporal variability in quality. But while allocating rates to these users, we also need to account for the time-varying resource requirements of the voice and data traffic handled by the base station. We can deal with this constraint by defining

$$\mathcal{C} = \Big\{ c_{(p,f)} : c_{(p,f)}(r) = \sum_{i \in \mathcal{N}} \frac{r_i}{p_i} - (1 - f),\ p \in \mathcal{P},\ f \in \mathcal{F} \Big\},$$

where $\mathcal{F}$ is a finite set of real numbers in $[0, 1]$, each element of which corresponds to the fraction of time in a slot that is utilized by the voice and data traffic.
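The scheduling constraints of WN and WN-E are simple enough to evaluate directly. The sketch below uses hypothetical helper names (`c_wn_e`, `feasible`) and illustrative numbers; it computes $c_{(p,f)}(r)$, and setting $f = 0$ recovers the WN constraint $c_p$.

```python
from typing import Sequence

def c_wn_e(r: Sequence[float], p: Sequence[float], f: float = 0.0) -> float:
    """Constraint c_(p,f)(r) = sum_i r_i / p_i - (1 - f).

    r_i / p_i is the fraction of the slot needed to serve user i at rate r_i
    when its peak rate is p_i; f is the fraction of the slot used by voice and
    data traffic. f = 0 recovers the WN constraint c_p.
    """
    if len(r) != len(p):
        raise ValueError("r and p must have one entry per user")
    return sum(r_i / p_i for r_i, p_i in zip(r, p)) - (1.0 - f)

def feasible(r, p, f=0.0) -> bool:
    # An allocation is feasible in the slot iff c_(p,f)(r) <= 0.
    return c_wn_e(r, p, f) <= 0.0

# Hypothetical slot: two users with peak rates 2 and 4 (same units as r).
p = [2.0, 4.0]
r = [1.0, 1.0]               # needs 1/2 + 1/4 = 3/4 of the slot
print(c_wn_e(r, p))          # -0.25: feasible under WN
print(c_wn_e(r, p, f=0.5))   # 0.25: infeasible when half the slot is taken
```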

Time-varying utility functions: For the users streaming video content discussed in case WN-E, it is more appropriate to view the perceived video quality of a user in a slot as the reward for that user in that slot. However, for users streaming video content, the dependence of perceived video quality on the compression rate (in a short-duration slot, roughly a second long, which corresponds to a collection of 20-30 frames) is time varying. This is typically due to the possibly changing nature of the content, e.g., from an action scene to a slower scene. Hence, the 'utility' function that maps the allocated resource (i.e., the rate) to the reward derived from it (i.e., the perceived video quality) is time varying. This is the setting of case WN-T, and we can handle it as follows. Let $q_{t,i}(w_i)$ denote the strictly increasing concave function that, in slot $t$, maps the rate $w_i$ allocated to user $i$ to its perceived video quality. For each user $i$, let $\mathcal{Q}_i$ be a finite set of such functions, and let $\mathcal{Q} = \times_{i \in \mathcal{N}} \mathcal{Q}_i$. Hence, we can view WN-T as a case with the following set of constraints:

$$\mathcal{C} = \Big\{ c_{(p,q)} : c_{(p,q)}(r) = \sum_{i \in \mathcal{N}} \frac{q_i^{-1}(r_i)}{p_i} - 1,\ p \in \mathcal{P},\ q \in \mathcal{Q} \Big\}.$$

Note that each element of $\mathcal{C}$ is a convex function, since the inverse of a strictly increasing concave function is convex and increasing.
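For WN-T, the constraint needs the inverse of the rate-to-quality map. The sketch below assumes, purely for illustration, the same map $q(w) = \log(1 + w)$ for every user; it is strictly increasing and concave with $q(0) = 0$, and its inverse $e^x - 1$ is convex, which is what makes $c_{(p,q)}$ convex. The code spot-checks midpoint convexity and computes $\max_i q(p_i)$ for a single hypothetical slot.

```python
import math

def q(w: float) -> float:
    """Hypothetical rate-to-quality map: strictly increasing, concave, q(0) = 0."""
    return math.log1p(w)

def q_inv(x: float) -> float:
    """Inverse of q: the rate needed to reach quality x (convex in x)."""
    return math.expm1(x)

def c_wn_t(r, p):
    """c_(p,q)(r) = sum_i q_i^{-1}(r_i) / p_i - 1, with the same q for every user."""
    return sum(q_inv(r_i) / p_i for r_i, p_i in zip(r, p)) - 1.0

p = [2.0, 4.0]
r1, r2 = [0.2, 0.9], [0.6, 0.3]   # two quality allocations
mid = [(a + b) / 2 for a, b in zip(r1, r2)]
# Midpoint convexity check: c(mid) <= (c(r1) + c(r2)) / 2.
print(c_wn_t(mid, p) <= (c_wn_t(r1, p) + c_wn_t(r2, p)) / 2)
# Largest quality reachable at peak rate in this slot (log 5).
print(max(q(p_i) for p_i in p))
```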

For WN and WN-E, we can verify that by choosing $r_{\max} = \max_{p \in \mathcal{P}} \max_{i \in \mathcal{N}} p_i$ and an $r_{\min}$ satisfying $0 < r_{\min} < \min_{p \in \mathcal{P}} \min_{i \in \mathcal{N}} p_i$, we satisfy C.1-C.4. In WN-T, if we assume that each function $q \in \mathcal{Q}$ is differentiable and concave with $q(0) = 0$ (which are very reasonable assumptions on the dependence between quality and compression rate), then we can verify that by choosing $r_{\min} = 0$ and $r_{\max} = \max_{p \in \mathcal{P}} \max_{i \in \mathcal{N}} \max_{q \in \mathcal{Q}_i} q(p_i)$, we satisfy C.1-C.4.

Variability-aware rate adaptation for video: The above formulation is applicable to the problem of finding optimal (joint) video rate adaptation that maximizes the sum QoE of users streaming videos over the resources of a shared network. Given the predictions of explosive growth of video traffic in the near future (see [?]), this is one of the important networking problems today. For a user viewing a video stream, variations in video quality over time have a detrimental impact on the user's QoE; see, e.g., [?], [?], [?]. Indeed, [?] even points out that variations in quality can result in a QoE that is worse than that of a constant-quality video with lower average quality. Furthermore, [?] proposed and evaluated a metric for QoE which roughly corresponds to the choices $U_i^R(r) = r$ and $U_i^V(v) = \sqrt{v + \delta}$ in the model described above, for a very small $\delta > 0$.



III. Optimal Variance-Sensitive Offline Policy 

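In this section, we study OPT(T), the offline formulation for optimal joint reward allocation introduced in Section I. In the offline setting, we assume that $(c)_{1:T}$, i.e., the realization of the process $(C)_{1:T}$, is known. We denote the objective function of OPT(T) by $\phi_T$, i.e.,

$$\phi_T((r)_{1:T}) = \sum_{i \in \mathcal{N}} \left( \frac{1}{T} \sum_{t=1}^{T} U_i^R(r_i(t)) - U_i^V\big(\mathrm{Var}^T((r_i)_{1:T})\big) \right),$$

where $(U_i^R)_{i \in \mathcal{N}}$ and $(U_i^V)_{i \in \mathcal{N}}$ are functions satisfying U.R and U.V respectively, and

$$\mathrm{Var}^T((r_i)_{1:T}) = \frac{1}{T} \sum_{t=1}^{T} \left( r_i(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i(\tau) \right)^2.$$

Hence the optimization problem OPT(T) can be rewritten as:

$$\max_{(r)_{1:T}} \; \phi_T((r)_{1:T}) \tag{3}$$
$$\text{subject to} \quad c_t(r(t)) \le 0 \quad \forall\, t \in \{1, \ldots, T\}, \tag{4}$$
$$r_i(t) \ge r_{\min} \quad \forall\, t \in \{1, \ldots, T\},\ \forall\, i \in \mathcal{N}, \tag{5}$$

where $c_t \in \mathcal{C}$ is a convex function for each $t$. The next result asserts that OPT(T) is a convex optimization problem satisfying Slater's condition (Section 5.2.3, [?]) and that it has a unique solution.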

Lemma 1. OPT(T) is a convex optimization problem satisfying Slater's condition with a unique solution.

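Proof: Since we made assumptions U.R and U.V, the convexity of the objective of OPT(T) is easy to establish once we prove the convexity of the function $U_i^V(\mathrm{Var}^T(\cdot))$ for each $i \in \mathcal{N}$. Using (1) and the definition of $\mathrm{Var}^T(\cdot)$, we can show that $U_i^V(\mathrm{Var}^T(\cdot))$ is a convex function for each $i \in \mathcal{N}$. The details are given next. For two different reward vectors $(r_i^1)_{1:T}$ and $(r_i^2)_{1:T}$, any $i \in \mathcal{N}$, $a \in (0, 1)$ and $\bar{a} = 1 - a$, we have that

$$\mathrm{Var}^T\big(a (r_i^1)_{1:T} + \bar{a} (r_i^2)_{1:T}\big) = \mathrm{Var}^T\big((a r_i^1 + \bar{a} r_i^2)_{1:T}\big) = \frac{1}{T} \sum_{t=1}^{T} \left( a \Big( r_i^1(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) \Big) + \bar{a} \Big( r_i^2(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau) \Big) \right)^2.$$

Using (1), we have that

$$U_i^V\big(\mathrm{Var}^T(a (r_i^1)_{1:T} + \bar{a} (r_i^2)_{1:T})\big) \le a\, U_i^V\big(\mathrm{Var}^T((r_i^1)_{1:T})\big) + \bar{a}\, U_i^V\big(\mathrm{Var}^T((r_i^2)_{1:T})\big).$$

Thus, $U_i^V(\mathrm{Var}^T(\cdot))$ is a convex function. Using the above arguments and the concavity of $U_i^R$ and $-U_i^V(\mathrm{Var}^T(\cdot))$, we conclude that OPT(T) is a convex optimization problem.

Note that, from (1) (since it is a strict inequality), the inequality above is strict unless

$$r_i^1(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) = r_i^2(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau) \quad \forall\, t \in \{1, \ldots, T\}.$$

Thus, for the inequality not to be strict, we require that $\mathrm{Var}^T((r_i^1)_{1:T}) = \mathrm{Var}^T((r_i^2)_{1:T})$. Further, Slater's condition is satisfied; this mainly follows from assumption C.4.

Now, for any $i \in \mathcal{N}$, $U_i^R$ and $-U_i^V(\mathrm{Var}^T(\cdot))$ are not necessarily strictly concave. But we can still show that OPT(T) has a unique optimal solution as follows. Let $(r^1)_{1:T}$ and $(r^2)_{1:T}$ be two optimal solutions to OPT(T). Then, from the concavity of the objective, $a (r^1)_{1:T} + \bar{a} (r^2)_{1:T}$ is also an optimal solution for any $a \in (0, 1)$ and $\bar{a} = 1 - a$. Due to the concavity of $U_i^R(\cdot)$ and the convexity of $U_i^V(\mathrm{Var}^T(\cdot))$, this is only possible if, for each $i \in \mathcal{N}$ and $1 \le t \le T$,

$$U_i^R\big(a r_i^1(t) + \bar{a} r_i^2(t)\big) = a\, U_i^R(r_i^1(t)) + \bar{a}\, U_i^R(r_i^2(t)),$$

and

$$U_i^V\big(\mathrm{Var}^T(a (r_i^1)_{1:T} + \bar{a} (r_i^2)_{1:T})\big) = a\, U_i^V\big(\mathrm{Var}^T((r_i^1)_{1:T})\big) + \bar{a}\, U_i^V\big(\mathrm{Var}^T((r_i^2)_{1:T})\big).$$

From the above discussion, the latter equality holds for each $i \in \mathcal{N}$ only if $\mathrm{Var}^T((r_i^1)_{1:T}) = \mathrm{Var}^T((r_i^2)_{1:T})$ for each $i \in \mathcal{N}$, and

$$r_i^1(t) = r_i^2(t) + \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau)$$

for each $i \in \mathcal{N}$ and $1 \le t \le T$. Since $\mathrm{Var}^T((r_i^1)_{1:T}) = \mathrm{Var}^T((r_i^2)_{1:T})$ for each $i \in \mathcal{N}$, the optimality of $(r^1)_{1:T}$ and $(r^2)_{1:T}$ gives

$$\sum_{i \in \mathcal{N}} \left( \frac{1}{T} \sum_{t=1}^{T} U_i^R(r_i^2(t)) - U_i^V\big(\mathrm{Var}^T((r_i^2)_{1:T})\big) \right) = \sum_{i \in \mathcal{N}} \left( \frac{1}{T} \sum_{t=1}^{T} U_i^R\Big( r_i^2(t) + \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau) \Big) - U_i^V\big(\mathrm{Var}^T((r_i^2)_{1:T})\big) \right).$$

Since $U_i^R$ is a strictly increasing function for each $i \in \mathcal{N}$, the above equation implies that

$$\sum_{\tau=1}^{T} r_i^1(\tau) = \sum_{\tau=1}^{T} r_i^2(\tau),$$

and thus,

$$r_i^1(t) = r_i^2(t) \quad \forall\, 1 \le t \le T,\ \forall\, i \in \mathcal{N}.$$

From the above discussion, we can conclude that OPT(T) has a unique solution. ■

We let $(r^T)_{1:T}$ denote the optimal solution to OPT(T). Since OPT(T) is a convex optimization problem satisfying Slater's condition (Lemma 1), the Karush-Kuhn-Tucker (KKT) conditions ([?]) given next are necessary and sufficient for optimality. Let $m_i^T = \frac{1}{T} \sum_{t=1}^{T} r_i^T(t)$.

KKT-OPT(T):

$(r^T)_{1:T}$ is an optimal solution to OPT(T) if and only if it is feasible and there exist non-negative constants $(\mu_t^T)_{1:T}$ and $(\gamma_i^T(t) : i \in \mathcal{N})_{1:T}$ such that for all $i \in \mathcal{N}$ and $t \in \{1, \ldots, T\}$, we have

$$(U_i^R)'(r_i^T(t)) - 2 (U_i^V)'\big(\mathrm{Var}^T((r_i^T)_{1:T})\big) \big(r_i^T(t) - m_i^T\big) - \mu_t^T c'_{t,i}(r^T(t)) + \gamma_i^T(t) = 0, \tag{6}$$
$$\mu_t^T c_t(r^T(t)) = 0, \tag{7}$$
$$\gamma_i^T(t) \big(r_i^T(t) - r_{\min}\big) = 0. \tag{8}$$

Here $c'_{t,i}$ denotes $\partial c_t / \partial r_i$, and we have used the fact that for any $i \in \mathcal{N}$ and $\tau' \in \{1, \ldots, T\}$,

$$\frac{\partial}{\partial r_i(\tau')} \Big( T\, \mathrm{Var}^T((r_i)_{1:T}) \Big) = 2 \left( r_i(\tau') - \frac{1}{T} \sum_{\tau=1}^{T} r_i(\tau) \right).$$

From (6), we see that the optimal reward allocation $r^T(t)$ in any time slot $t$ depends on the entire allocation $(r^T)_{1:T}$ only through the following quantities associated with $(r^T)_{1:T}$: (i) the time-average rewards $m_i^T$, and (ii) $(U_i^V)'(\cdot)$ evaluated at the variance $\mathrm{Var}^T((r_i^T)_{1:T})$ seen by the respective users. So, if a genie revealed these quantities, the optimal allocation for each slot $t$ could be determined by solving an optimization that only requires knowledge of $c_t$ (associated with the current slot) and not of $(c)_{1:T}$. We exploit this key idea while formulating the online algorithm AVR (proposed in Section V).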

IV. A RELATED PROBLEM: OPTSTAT

In this section, we introduce and study another optimization problem, OPTSTAT, closely related to OPT(T). The formulation of OPT(T) mainly involves time averages of various quantities. Instead, the formulation of OPTSTAT is based on the expected values of the corresponding quantities, evaluated using the stationary distribution of $(C_t)_t$.

Recall (see C.5) that $(C_t)_t$ is a stationary ergodic process with stationary distribution $(\pi(c) : c \in \mathcal{C})$, i.e., for $c \in \mathcal{C}$, $\pi(c)$ is the probability of the event $C_t = c$. Since $\mathcal{C}$ is finite, we assume without any loss of generality that $\pi(c) > 0$ for each $c \in \mathcal{C}$. Let $(r(c))_{c \in \mathcal{C}}$ be a vector representing the reward allocation $r(c)\ (\in \mathbb{R}^N)$ to the users for each $c \in \mathcal{C}$. Although we are abusing the notation introduced earlier, where $r(t)$ denoted the allocation to the users in slot $t$, the meaning will be clear from the context. Now, let

$$\phi_\pi((r(c))_{c \in \mathcal{C}}) = \sum_{i \in \mathcal{N}} \left( \sum_{c \in \mathcal{C}} \pi(c)\, U_i^R(r_i(c)) - U_i^V\big(\mathrm{Var}^\pi((r_i(c))_{c \in \mathcal{C}})\big) \right),$$
$$\mathrm{Var}^\pi((r_i(c))_{c \in \mathcal{C}}) = \sum_{c \in \mathcal{C}} \pi(c) \left( r_i(c) - \sum_{c_1 \in \mathcal{C}} \pi(c_1)\, r_i(c_1) \right)^2.$$
The optimization problem OPTSTAT is given below:

$$\max_{(r(c))_{c \in \mathcal{C}}} \; \phi_\pi((r(c))_{c \in \mathcal{C}})$$
$$\text{subject to} \quad c(r(c)) \le 0 \quad \forall\, c \in \mathcal{C},$$
$$r_i(c) \ge r_{\min} \quad \forall\, c \in \mathcal{C},\ \forall\, i \in \mathcal{N}.$$

The next result gives a few useful properties of OPTSTAT.

Lemma 2. (a) OPTSTAT is a convex optimization problem satisfying Slater's condition.

(b) OPTSTAT has a unique solution.

Proof: The proof is similar to that of Lemma 1 (and is easy to establish once we prove the convexity of the function $\mathrm{Var}^\pi(\cdot)$). ■

Using Lemma 2 (a), we can conclude that the KKT conditions are necessary and sufficient for optimality for OPTSTAT. Let $(r^\pi(c) : c \in \mathcal{C})$ denote the optimal solution.

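KKT-OPTSTAT:

There exist non-negative constants $(\mu^\pi(c) : c \in \mathcal{C})$ and $((\gamma_i^\pi(c))_{i \in \mathcal{N}} : c \in \mathcal{C})$ such that for all $i \in \mathcal{N}$ and $c \in \mathcal{C}$,

$$\pi(c) \left( (U_i^R)'(r_i^\pi(c)) - 2 (U_i^V)'\big(\mathrm{Var}^\pi((r_i^\pi(c))_{c \in \mathcal{C}})\big) \Big( r_i^\pi(c) - \sum_{c_1 \in \mathcal{C}} \pi(c_1)\, r_i^\pi(c_1) \Big) \right) - \mu^\pi(c)\, c'_i(r^\pi(c)) + \gamma_i^\pi(c) = 0, \tag{9}$$
$$\mu^\pi(c)\, c(r^\pi(c)) = 0, \tag{10}$$
$$\gamma_i^\pi(c) \big( r_i^\pi(c) - r_{\min} \big) = 0, \tag{11}$$

where $c'_i$ denotes $\partial c / \partial r_i$, and we used the following result: for any $c_0 \in \mathcal{C}$ and $i \in \mathcal{N}$,

$$\frac{\partial\, \mathrm{Var}^\pi((r_i(c))_{c \in \mathcal{C}})}{\partial r_i(c_0)} = 2 \pi(c_0) \left( r_i(c_0) - \sum_{c_1 \in \mathcal{C}} \pi(c_1)\, r_i(c_1) \right).$$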
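The derivative identity used in (9) is easy to sanity-check numerically. The sketch below uses a hypothetical random distribution and allocation and compares the closed form with central finite differences. With uniform weights $\pi(c) = 1/T$, multiplying the closed form by $T$ gives $2\big(r_i(\tau') - \frac{1}{T}\sum_{\tau} r_i(\tau)\big)$, i.e., the identity used for (6).

```python
import random

def var_pi(r, pi):
    """Var^pi of an allocation r(c) under weights pi(c): sum_c pi(c) (r(c) - mean)^2."""
    mean = sum(w_c * x for w_c, x in zip(pi, r))
    return sum(w_c * (x - mean) ** 2 for w_c, x in zip(pi, r))

def grad_var_pi(r, pi):
    """Closed form: d Var^pi / d r(c0) = 2 pi(c0) (r(c0) - sum_c1 pi(c1) r(c1))."""
    mean = sum(w_c * x for w_c, x in zip(pi, r))
    return [2.0 * w_c * (x - mean) for w_c, x in zip(pi, r)]

def fd_grad(f, r, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = []
    for k in range(len(r)):
        up, dn = list(r), list(r)
        up[k] += h
        dn[k] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

rng = random.Random(0)
w = [rng.random() for _ in range(4)]
pi = [x / sum(w) for x in w]                    # hypothetical stationary distribution
r = [rng.uniform(0.5, 2.0) for _ in range(4)]   # hypothetical allocation r_i(c)

exact = grad_var_pi(r, pi)
approx = fd_grad(lambda x: var_pi(x, pi), r)
print(max(abs(a - b) for a, b in zip(exact, approx)))  # tiny finite-difference error
```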



V. Adaptive Variance-Aware Reward Allocation

In this section, we present our online algorithm AVR to solve OPT(T), and establish its asymptotic optimality.

The reward allocations for AVR are obtained by solving OPTAVR(m, v, c) given below:

$$\max_{r} \; \sum_{i \in \mathcal{N}} \Big( U_i^R(r_i) - (U_i^V)'(v_i)\, (r_i - m_i)^2 \Big) + h_0(v)$$
$$\text{subject to} \quad c(r) \le 0, \tag{12}$$
$$r_i \ge r_{\min} \quad \forall\, i \in \mathcal{N}, \tag{13}$$

where 

Note that OPTAVR(m, v, c) is closely related to OPT-ONLINE (discussed in Subsection I-A). Also, note that $h_0(v)$ does not depend on the allocation and thus can be ignored while solving the optimization problem. However, it modifies the objective function, and thus the optimal value of the objective function, to ensure certain nice properties for the partial derivatives of the latter (see Lemma 3 (b)). Let $r^*(m, v, c)$ denote the optimal solution to OPTAVR(m, v, c). Also, let $\mathcal{H}$ be given by:

$$\mathcal{H} = [r_{\min}, r_{\max}]^N \times [0, v_{\max}]^N,$$

where $\times$ denotes the Cartesian product of sets.

Next, we describe the algorithm AVR in detail. AVR consists of three steps, AVR.0-AVR.2, given next:

Adaptive Variance-aware Reward allocation (AVR)

AVR.0: Initialize: Let $(m(0), v(0)) \in \mathcal{H}$.

In each slot $t + 1$ for $t \ge 0$, carry out the following steps:
AVR.1: The reward allocation in slot $t + 1$ is given by $r^*(m(t), v(t), c_{t+1})$ and will be denoted by $r^*(t + 1)$ (when the dependence on the variables is clear from context).
AVR.2: In slot $t + 1$, update $m_i$ as follows: for all $i \in \mathcal{N}$,

$$m_i(t + 1) = m_i(t) + \frac{1}{t + 1} \big( r_i^*(t + 1) - m_i(t) \big), \tag{14}$$

and update $v_i$ as follows: for all $i \in \mathcal{N}_{vl}$, $v_i(t + 1) = v_i(0)$, and for all $i \in \mathcal{N}_{vn}$,

$$v_i(t + 1) = v_i(t) + \frac{1}{t + 1} \Big( \big( r_i^*(t + 1) - m_i(t) \big)^2 - v_i(t) \Big). \tag{15}$$
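The recursion of AVR.0-AVR.2 can be sketched for a deliberately tiny, hypothetical instance: a single user with $U^R = \log$ and $U^V(v) = \sqrt{v + \delta}$ (so the user is in $\mathcal{N}_{vn}$), and WN constraints $c_p(r) = r/p - 1$ with the peak rate $p$ drawn i.i.d. from a small set. With one user, OPTAVR has a closed form: the stationary point of the concave objective, clipped to $[r_{\min}, p]$. The i.i.d. channel, the parameter values and the $1/(t+1)$ step size (under which $m(t)$ is exactly the running average of the allocations) are assumptions for illustration, not a definitive implementation.

```python
import math
import random

def optavr_single_user(m, v, p, r_min, delta):
    """r*(m, v, c_p) for one user: max log r - (U^V)'(v) (r - m)^2 s.t. r_min <= r <= p."""
    a = 0.5 / math.sqrt(v + delta)              # (U^V)'(v) for U^V(v) = sqrt(v + delta)
    r = 0.5 * (m + math.sqrt(m * m + 2.0 / a))  # stationary point of the concave objective
    return min(p, max(r_min, r))

def run_avr(peaks, probs, T, r_min=0.1, delta=0.01, m0=1.0, v0=0.5, seed=1):
    rng = random.Random(seed)
    m, v = m0, v0                                  # AVR.0: (m(0), v(0)) in H
    allocations = []
    for t in range(T):                             # slot t + 1
        p = rng.choices(peaks, weights=probs)[0]   # observe c_{t+1}
        r = optavr_single_user(m, v, p, r_min, delta)  # AVR.1
        step = 1.0 / (t + 1)
        m_old = m
        m = m + step * (r - m)                         # AVR.2: mean update
        v = v + step * ((r - m_old) ** 2 - v)          # AVR.2: variance update
        allocations.append(r)
    return m, v, allocations

m_T, v_T, allocs = run_avr(peaks=[1.0, 2.0, 3.0], probs=[1, 1, 1], T=20000)
print(m_T, v_T)
```

Because the user's variance estimate enters OPTAVR only through $(U^V)'(v)$, a user with linear $U^V$ would skip the variance update entirely, as AVR.2 does for $\mathcal{N}_{vl}$.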



The update equations (14)-(15) roughly ensure that the parameters $m(t)$ and $(v_i(t))_{i \in \mathcal{N}_{vn}}$ keep track of the mean reward and the variance in reward, respectively, associated with the reward allocation under AVR. Also, note that we do not have to keep track of the estimates of variance in reward seen by users $i$ with linear $U_i^V$ (i.e., $i \in \mathcal{N}_{vl}$), since $(U_i^V)'$ is then constant.

We let $\theta_t = (m(t), v(t))$ for each $t$. The update equations (14)-(15) ensure that $\theta_t$ stays in the set $\mathcal{H}$.

For any $(m, v) \in \mathcal{H}$ and $c \in \mathcal{C}$, we have $(U_i^V)'(v_i) > 0$ for each $i \in \mathcal{N}$ (see assumption U.V). Hence, OPTAVR(m, v, c) is a convex optimization problem with a unique solution. Further, using assumption C.4, we can show that it satisfies Slater's condition. Hence, the optimal solution of OPTAVR(m, v, c) satisfies the KKT conditions given below.

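KKT-OPTAVR(m, v, c):

There exist non-negative constants $\mu^*$ and $(\gamma_i^* : i \in \mathcal{N})$ such that for all $i \in \mathcal{N}$,

$$(U_i^R)'(r_i^*) - 2 (U_i^V)'(v_i) (r_i^* - m_i) + \gamma_i^* - \mu^* c'_i(r^*) = 0, \tag{16}$$
$$\mu^* c(r^*) = 0, \tag{17}$$
$$\gamma_i^* (r_i^* - r_{\min}) = 0. \tag{18}$$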
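For the WN constraint $c_p(r) = \sum_i r_i/p_i - 1$, conditions (16)-(18) can be solved with a one-dimensional search over $\mu^*$. The sketch below assumes, purely for illustration, $U_i^R = \log$ (so $r_{\min} > 0$) and $U_i^V(v) = \sqrt{v + \delta}$; these choices, the function names and the bisection approach are ours, not the paper's. For a fixed $\mu$, each user's term $\log r_i - a_i (r_i - m_i)^2 - \mu r_i / p_i$ with $a_i = (U_i^V)'(v_i)$ is strictly concave, and its stationary point solves $2 a_i r_i^2 + (\mu/p_i - 2 a_i m_i) r_i - 1 = 0$. Clipping at $r_{\min}$ plays the role of $\gamma_i^*$, and bisection on $\mu$ enforces (17).

```python
import math

def user_response(mu, p_i, m_i, a_i, r_min):
    """Maximizer over r >= r_min of log r - a_i (r - m_i)^2 - mu r / p_i.

    Setting the derivative 1/r - 2 a_i (r - m_i) - mu / p_i to zero gives
    2 a_i r^2 + b r - 1 = 0 with b = mu / p_i - 2 a_i m_i; take the positive
    root (in a cancellation-free form) and clip at r_min.
    """
    b = mu / p_i - 2.0 * a_i * m_i
    s = math.sqrt(b * b + 8.0 * a_i)
    root = 2.0 / (b + s) if b >= 0 else (s - b) / (4.0 * a_i)
    return max(r_min, root)

def solve_optavr_wn(m, v, p, r_min, delta=0.01, iters=200):
    """Sketch of r*(m, v, c_p) for U^R = log and U^V(v) = sqrt(v + delta)."""
    a = [0.5 / math.sqrt(v_i + delta) for v_i in v]  # (U^V)'(v_i) > 0

    def alloc(mu):
        return [user_response(mu, p_i, m_i, a_i, r_min) for p_i, m_i, a_i in zip(p, m, a)]

    def load(mu):  # c_p(r(mu)) = sum_i r_i / p_i - 1, nonincreasing in mu
        return sum(r_i / p_i for r_i, p_i in zip(alloc(mu), p)) - 1.0

    if load(0.0) <= 0.0:                 # constraint slack: mu* = 0
        return alloc(0.0), 0.0
    if sum(r_min / p_i for p_i in p) > 1.0:
        raise ValueError("r_min allocation is infeasible (violates C.1)")
    lo, hi = 0.0, 1.0
    while load(hi) > 0.0:                # grow the bracket until the load is feasible
        hi *= 2.0
    for _ in range(iters):               # invariant: load(lo) > 0 >= load(hi)
        mid = 0.5 * (lo + hi)
        if load(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return alloc(hi), hi

# Hypothetical slot with two users.
r_star, mu_star = solve_optavr_wn(m=[0.5, 0.5], v=[0.1, 0.2], p=[1.0, 2.0], r_min=0.05)
print(r_star, mu_star)
```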



Let $h(m, v, c)$ denote the optimal value of the objective function of OPTAVR(m, v, c), i.e., $h$ is a function defined on an open set (the obvious one that can be obtained from the domains of the functions $(U_i^R, U_i^V)_{i \in \mathcal{N}}$) containing $\mathcal{H}$, given by

$$h(m, v, c) = \sum_{i \in \mathcal{N}} \Big( U_i^R(r_i^*) - (U_i^V)'(v_i)\, (r_i^* - m_i)^2 \Big) + h_0(v),$$

where $r^*$ stands for $r^*(m, v, c)$.

In the next result, we establish continuity and differentiability properties of $r^*(m, v, c)$ (also denoted by $r^*$ in the result) and $h(m, v, c)$, respectively, viewing them as functions of $(m, v)$.

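Lemma 3. For any $c \in \mathcal{C}$ and $\theta = (m, v) \in \mathcal{H}$:

(a) $r^*(\theta, c)$ is a continuous function of $\theta$.

(b) For each $i \in \mathcal{N}$,

$$\frac{\partial h(\theta, c)}{\partial m_i} = 2 (U_i^V)'(v_i) \big( r_i^*(\theta, c) - m_i \big), \qquad \frac{\partial h(\theta, c)}{\partial v_i} = \ldots$$

(c) $E[r^*(\theta, C^\pi)]$ is a continuous function of $\theta$.

(d) For each $i \in \mathcal{N}$,

$$\frac{\partial E[h(\theta, C^\pi)]}{\partial v_i} = \ldots \big( v_i - E[(r_i^*(\theta, C^\pi) - m_i)^2] \big).$$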



Proof Sketch: The proofs of parts (a) and (b) mainly rely on fundamental results on the perturbation analysis of optimization problems from [?] and [?]. Part (a) can be proved using Theorem 2.2 in [?]. The result in part (b) can be shown using Theorem 4.1 in [?]. This theorem tells us that if certain conditions are met, then we can evaluate the partial derivative of the optimal value of a parametric optimization problem (with respect to any parameter) by evaluating the partial derivative of the objective of the optimization problem and then substituting the optimal solution. For instance, using the theorem, we can evaluate the partial derivative of the optimal value $h(\theta, c)$ with respect to $m_i$ as follows. We first evaluate the partial derivative of the objective function of OPTAVR($\theta$, c):

$$\frac{\partial}{\partial m_i} \left( h_0(v) + \sum_{j \in \mathcal{N}} \Big( U_j^R(r_j) - (U_j^V)'(v_j)\, (r_j - m_j)^2 \Big) \right) = 2 (U_i^V)'(v_i) (r_i - m_i).$$

Now, on substituting $r^*$ in the above expression, we obtain the first result in part (b). The other results can be obtained similarly.

Parts (c) and (d) can be shown using parts (a) and (b), respectively, and the Bounded Convergence Theorem (see [?]). □
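The envelope-type argument behind Lemma 3 (b) can be checked on a one-user instance. In the sketch below, the assumptions are $U^R = \log$, a fixed hypothetical value $a$ of $(U^V)'(v)$, the constraint $r \le p$, and dropping $h_0$ because it does not depend on $m$. The finite-difference derivative of the optimal value in $m$ matches $2a(r^* - m)$, both when $r^*$ is interior and when it sits on the constraint $r = p$.

```python
import math

def r_star(m, a, p, r_min):
    """Single-user OPTAVR with U^R = log and constraint r <= p (c_p(r) = r/p - 1).

    The unconstrained maximizer of log r - a (r - m)^2 is (m + sqrt(m^2 + 2/a)) / 2;
    since the objective is concave in r, clipping to [r_min, p] gives r*.
    """
    r = 0.5 * (m + math.sqrt(m * m + 2.0 / a))
    return min(p, max(r_min, r))

def h(m, a, p, r_min):
    """Optimal value of OPTAVR, with h_0(v) dropped (it does not depend on m)."""
    r = r_star(m, a, p, r_min)
    return math.log(r) - a * (r - m) ** 2

def dh_dm_formula(m, a, p, r_min):
    """Partial derivative of the objective in m, evaluated at r*."""
    return 2.0 * a * (r_star(m, a, p, r_min) - m)

def dh_dm_numeric(m, a, p, r_min, eps=1e-5):
    return (h(m + eps, a, p, r_min) - h(m - eps, a, p, r_min)) / (2 * eps)

a = 1.2  # hypothetical value of (U^V)'(v)
for p in (3.0, 0.8):  # interior optimum, then optimum clipped at r = p
    print(p, dh_dm_formula(0.7, a, p, 0.05), dh_dm_numeric(0.7, a, p, 0.05))
```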

From part (b) of the above result, we see that the update equations (14)-(15) ensure that $\theta(t)$ moves in a direction that increases $h(\cdot)$. This is in part due to the careful choice of the function $h_0$ (which is independent of the variables being optimized) appearing in the objective function of OPTAVR.

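Next, we find relationships between the optimal solution $(r^\pi(c) : c \in \mathcal{C})$ of OPTSTAT and OPTAVR. Towards that end, let $m_i^\pi = \sum_{c \in \mathcal{C}} \pi(c)\, r_i^\pi(c)$ and $v_i^\pi = \mathrm{Var}^\pi((r_i^\pi(c))_{c \in \mathcal{C}})$ for each $i \in \mathcal{N}$. Next, let

$$\mathcal{H}^* = \{(m, v) \in \mathcal{H} : (m, v) \text{ satisfies (19)-(20)}\},$$

where the conditions (19)-(20) are given below:

$$m_i = E[r_i^*(m, v, C^\pi)] \quad \forall\, i \in \mathcal{N}, \tag{19}$$
$$v_i = \mathrm{Var}\big(r_i^*(m, v, C^\pi)\big) \quad \forall\, i \in \mathcal{N}_{vn}. \tag{20}$$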

Part (a) of the next result provides a fixed-point-like relationship for the optimal solution to OPTSTAT using the optimal solution function $r^*(\cdot)$ of OPTAVR, and part (b) is a useful consequence of part (a). A proof of the result is given in Appendix A.

Lemma 4. $(m^\pi, v^\pi)$ satisfies

(a) $r^*(m^\pi, v^\pi, c) = r^\pi(c)$ for each $c \in \mathcal{C}$, and

(b) $(m^\pi, v^\pi) \in \mathcal{H}^*$.

The next result shows that the optimal solution to OPTSTAT can be obtained from any element of $\mathcal{H}^*$ by using the optimal solution function $r^*(\cdot)$. It also gives useful uniqueness results for the components of the elements of $\mathcal{H}^*$. A proof of the result is given in Appendix B.

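Lemma 5. Suppose $(m_1, v_1) \in \mathcal{H}^*$. Then,

(a) $(r^*(m_1, v_1, c))_{c \in \mathcal{C}}$ is an optimal solution to OPTSTAT.

Suppose, in addition, that $(m_2, v_2) \in \mathcal{H}^*$. Then,

(b) $r^*(m_1, v_1, c) = r^*(m_2, v_2, c)$ for each $c \in \mathcal{C}$,

(c) $m_{1i} = m_{2i}$ for each $i \in \mathcal{N}$, and $v_{1i} = v_{2i}$ for each $i \in \mathcal{N}_{vn}$, and

(d) $m_{1i} = m_i^\pi$ for each $i \in \mathcal{N}$, and $v_{1i} = v_i^\pi$ for each $i \in \mathcal{N}_{vn}$.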

So far, we have focused only on the optimization problem OPTAVR associated with AVR. In the next subsection, we study the evolution of $(\theta_t)$ under AVR.

A. Convergence Analysis

In this subsection, we establish some properties related to the convergence of the sequence $(\theta_t)$. These properties are key to the proof of the main optimality result (Theorem 1).

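To that end, we study the differential equation

$$\frac{d\theta(t)}{dt} = g(\theta(t)) + z(\theta(t)), \qquad (21)$$

where $g(\theta)$ is a function taking values in $\mathbb{R}^{2N}$, defined as follows. For $\theta = (m, v) \in \mathcal{H}$, let

$$(g(\theta))_i = E[r_i^*(\theta, C^\pi)] - m_i, \quad i \in \mathcal{N},$$

$$(g(\theta))_{N+i} = E\left[\left(r_i^*(\theta, C^\pi) - m_i\right)^2\right] - v_i, \quad i \in \mathcal{N}.$$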

Part of the motivation for studying the above differential equation becomes clear on comparing the RHS of (21) with the update equations (14)-(15) of AVR.

We now study (21) in light of the above result and obtain a convergence result for the differential equation. It states that, for any initial condition, $\theta(t)$ evolving according to (21) converges to the set

$$\mathcal{H}_* = \left\{\theta = (m^\pi, v) : r^*(\theta, c) = r^*(\theta^\pi, c)\ \forall c \in \mathcal{C}, \ldots \right\}.$$

Using (19)-(20), we can verify that $\mathcal{H}^* \subseteq \mathcal{H}_*$. A proof of the next result is discussed in Appendix C.

Lemma 6. Suppose $\theta(t)$ evolves according to (21). Then $\theta(t)$ converges to $\mathcal{H}_*$ as $t \to \infty$ for any $\theta(0) \in \mathcal{H}$.
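To make the role of $g(\theta)$ concrete, the following sketch integrates $d\theta/dt = g(\theta)$ with a forward-Euler scheme in a deliberately trivial setting. The setting rests on hypothetical simplifications:

- There is a single user whose allocation ignores $\theta$, i.e. $r^*(\theta, c) = c$. The fixed point of $g$ is then $m = E[C^\pi]$, $v = \mathrm{Var}(C^\pi)$.
- Any constraint or projection term in (21) is ignored.
- The state values, probabilities, step size and horizon are made up for illustration.

```python
def g(theta, pi):
    """Mean field g(theta) for one user, assuming r*(theta, c) = c.

    pi maps each (numeric) state c to its probability pi(c).
    """
    m, v = theta
    mean_r = sum(p * c for c, p in pi.items())
    second = sum(p * (c - m) ** 2 for c, p in pi.items())
    return mean_r - m, second - v


def euler(theta0, pi, dt=0.01, steps=5000):
    """Forward-Euler integration of d theta / dt = g(theta)."""
    m, v = theta0
    for _ in range(steps):
        dm, dv = g((m, v), pi)
        m, v = m + dt * dm, v + dt * dv
    return m, v


pi = {1.0: 0.25, 3.0: 0.75}
m_end, v_end = euler((0.0, 0.0), pi)
print(m_end, v_end)  # close to E[C] = 2.5 and Var(C) = 0.75
```

In this toy case, the trajectory settles at the unique point where both components of $g$ vanish. That point is the analogue of conditions (19)-(20).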

Lemma 6 thus gives a key convergence result for the differential equation (21), which is closely related to the update equations (14)-(15) of AVR. Next, we use this result to obtain a convergence result for $(\theta_t)$. We do so by viewing (14)-(15) as a stochastic approximation update equation and using a result from [?] that relates it to the differential equation (21).



Lemma 7. If $\theta_0 \in \mathcal{H}$, then the sequence $(\theta_t)$ generated by AVR converges almost surely to the set $\mathcal{H}_*$.

Proof Sketch: The result can be proved by viewing (14)-(15) as a stochastic approximation update equation. The proof mainly uses Lemma 6 and Theorem 1.1 of Chapter 6 in [?], which gives sufficient conditions for the convergence of a stochastic approximation scheme. □
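The stochastic approximation viewpoint can be illustrated by running $1/t$-step updates whose mean field is the $g(\theta)$ defined for (21). The sketch rests on two assumptions:

- (14)-(15) take the form $m(t) = m(t-1) + \frac{1}{t}(r^*(t) - m(t-1))$ and $v(t) = v(t-1) + \frac{1}{t}\left((r^*(t) - m(t-1))^2 - v(t-1)\right)$, with no projection. This form is inferred from $g$.
- There is a single user with the hypothetical allocation $r^*(\theta, c) = c$, and states are drawn i.i.d. from a made-up $\pi$.

```python
import random


def run_updates(pi, T, seed=0):
    """1/t-step recursions for (m, v), assuming r*(theta, c) = c for one user.

    States c_t are drawn i.i.d. from pi (a dict from numeric state to probability).
    """
    rng = random.Random(seed)
    states = list(pi)
    weights = [pi[c] for c in states]
    m, v = 0.0, 0.0
    for t in range(1, T + 1):
        c_t = rng.choices(states, weights=weights)[0]
        r_t = c_t                      # toy allocation r*(theta, c) = c
        step = 1.0 / t
        # The right-hand side uses the previous mean m(t-1) in both updates.
        m, v = m + step * (r_t - m), v + step * ((r_t - m) ** 2 - v)
    return m, v


m_T, v_T = run_updates({1.0: 0.25, 3.0: 0.75}, T=100_000)
print(m_T, v_T)
```

With a $1/t$ step size, $m(t)$ is exactly the running sample mean of the allocations. The iterates therefore settle near the fixed point of $g$ (here $m = 2.5$, $v = 0.75$), up to sampling noise.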

As pointed out earlier, our main interest is in the convergence properties of $\left(m_i(t), (U_i^V)'(v_i(t))\right)$. The next result uses Lemma 7 to establish the desired convergence properties. A proof of the result is given in Appendix D.



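Lemma 8. If $\theta_0 \in \mathcal{H}$, then the sequence $(\theta_t)$ generated by AVR satisfies:

(a) For each $i \in \mathcal{N}$, $\lim_{t \to \infty} m_i(t) = m_i^\pi$, and

(b) $\lim_{t \to \infty} r^*(\theta_t, c) = r^*(\theta^\pi, c)$ for each $c \in \mathcal{C}$, and

(c) For each $i \in \mathcal{N}_{vn}$, $\lim_{t \to \infty} (U_i^V)'(v_i(t)) = (U_i^V)'(v_i^\pi)$.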

Next, we use Lemma 8 and stationarity to establish certain properties of the time averages of the reward allocations under the online scheme AVR. For brevity, in the following result, we let $r^*(t)$ denote $r^*(m(t), v(t), c_t)$. A proof of the result is given in Appendix E.

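Lemma 9. For almost all sample paths,

(a) For each $i \in \mathcal{N}$, $\displaystyle\lim_{T \to \infty} \frac{1}{T}\sum_{\tau=1}^{T} r_i^*(\tau) = \lim_{t \to \infty} m_i(t)$.

(b) For each $i \in \mathcal{N}$, $\displaystyle\lim_{T \to \infty} (U_i^V)'\left(\mathrm{Var}_T\left((r_i^*)_{1:T}\right)\right) = \lim_{t \to \infty} (U_i^V)'(v_i(t))$.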



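B. Asymptotic Optimality of AVR

The next result establishes the asymptotic optimality of AVR: if we run AVR for a long enough period, the difference in performance between AVR and the optimal finite-horizon policy becomes negligible.

Theorem 1. For almost all sample paths, the following two statements hold:

(a) Feasibility: The allocation $(r^*)_{1:T}$ associated with AVR satisfies (4) and (5) for each $i \in \mathcal{N}$.

(b) Optimality: AVR is asymptotically optimal, i.e.,

$$\lim_{T \to \infty} \left( \phi_T\big((r^*)_{1:T}\big) - \phi_T\big((r^{\mathrm{OPT}})_{1:T}\big) \right) = 0,$$

where $(r^{\mathrm{OPT}})_{1:T}$ denotes an optimal allocation for the finite-horizon problem over time slots $1, \ldots, T$.

Proof: The allocation $(r^*)_{1:T}$ associated with AVR satisfies (12) and (13) in each time slot, so it also satisfies (4) and (5). Thus, part (a) is true.

To prove part (b), consider any realization of $(c_t)_{1:T}$. Let $(\mu^*)_{1:T}$ and $(\gamma_i^* : i \in \mathcal{N})_{1:T}$ be the sequences of non-negative real numbers satisfying (16), (17) and (18) for this realization. Hence, from the non-negativity of these numbers and the feasibility of $(r^{\mathrm{OPT}})_{1:T}$, we have

$$\phi_T\big((r^{\mathrm{OPT}})_{1:T}\big) \le \psi_T\big((r^{\mathrm{OPT}})_{1:T}\big),$$

where, for an allocation $(r)_{1:T}$,

$$\psi_T\big((r)_{1:T}\big) = \sum_{i \in \mathcal{N}} \left( \frac{1}{T}\sum_{t=1}^{T} U_i^E(r_i(t)) - U_i^V\big(\mathrm{Var}_T((r_i)_{1:T})\big) \right) + \frac{1}{T}\sum_{t=1}^{T} \left( \sum_{i \in \mathcal{N}} \gamma_i^*(t)\big(r_i(t) - r_{\min}\big) - \mu^*(t)\, c_t(r(t)) \right).$$

Since $\psi_T$ is a differentiable concave function, we have (see [?])

$$\psi_T\big((r^{\mathrm{OPT}})_{1:T}\big) \le \psi_T\big((r^*)_{1:T}\big) + \nabla \psi_T\big((r^*)_{1:T}\big) \cdot \left( (r^{\mathrm{OPT}})_{1:T} - (r^*)_{1:T} \right),$$

where '$\cdot$' denotes the dot product. The multiplier terms vanish at $(r^*)_{1:T}$ by complementary slackness, so $\psi_T((r^*)_{1:T}) = \phi_T((r^*)_{1:T})$. Hence, we have

$$\phi_T\big((r^{\mathrm{OPT}})_{1:T}\big) - \phi_T\big((r^*)_{1:T}\big) \le \sum_{t=1}^{T}\sum_{i \in \mathcal{N}} \frac{r_i^{\mathrm{OPT}}(t) - r_i^*(t)}{T} \Big( (U_i^E)'(r_i^*(t)) - 2(U_i^V)'\big(\mathrm{Var}_T((r_i^*)_{1:T})\big)\Big(r_i^*(t) - \frac{1}{T}\sum_{\tau=1}^{T} r_i^*(\tau)\Big) + \gamma_i^*(t) - \mu^*(t)\frac{\partial c_t}{\partial r_i}(r^*(t)) \Big).$$

Now, since $(\mu^*)_{1:T}$ and $(\gamma_i^* : i \in \mathcal{N})_{1:T}$ satisfy (16), (17) and (18), we have

$$\phi_T\big((r^{\mathrm{OPT}})_{1:T}\big) - \phi_T\big((r^*)_{1:T}\big) \le \sum_{t=1}^{T}\sum_{i \in \mathcal{N}} \frac{r_i^{\mathrm{OPT}}(t) - r_i^*(t)}{T} \Big( -2(U_i^V)'\big(\mathrm{Var}_T((r_i^*)_{1:T})\big)\Big(r_i^*(t) - \frac{1}{T}\sum_{\tau=1}^{T} r_i^*(\tau)\Big) + 2(U_i^V)'(v_i(t-1))\big(r_i^*(t) - m_i(t-1)\big) \Big). \qquad (22)$$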

From Lemma 9 (a)-(c) and the continuity of the functions involved, the following term (appearing above) can be made as small as desired. This is done by choosing a large enough $T$ and then a large enough $t$:

$$\left| -2(U_i^V)'\big(\mathrm{Var}_T((r_i^*)_{1:T})\big)\Big(r_i^*(t) - \frac{1}{T}\sum_{\tau=1}^{T} r_i^*(\tau)\Big) + 2(U_i^V)'(v_i(t-1))\big(r_i^*(t) - m_i(t-1)\big) \right|.$$
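Also, $|r_i^{\mathrm{OPT}}(t) - r_i^*(t)| \le r_{\max}$. Hence, taking limits in (22),

$$\liminf_{T \to \infty} \left( \phi_T\big((r^*)_{1:T}\big) - \phi_T\big((r^{\mathrm{OPT}})_{1:T}\big) \right) \ge 0$$

holds for almost all sample paths. From the optimality of $(r^{\mathrm{OPT}})_{1:T}$,

$$\phi_T\big((r^{\mathrm{OPT}})_{1:T}\big) \ge \phi_T\big((r^*)_{1:T}\big).$$

The result follows from the above two inequalities. ■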
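The proof rests on two ingredients:

- the first-order characterization of concavity;
- the gradient of the variance penalty, whose $(i, t)$ component carries the factor $\frac{2}{T}\big(r_i(t) - \frac{1}{T}\sum_\tau r_i(\tau)\big)$.

The sketch below checks both on random points for a single user. It uses the hypothetical choices $U^E(x) = \log(1 + x)$ and $U^V(v) = v^2$, which is convex and increasing on $v \ge 0$, so $\phi_T$ is concave. It is a sanity check of the inequality used above, not part of AVR.

```python
import math
import random


def phi(r):
    """phi_T for one user: mean of U^E(r(t)) minus U^V(Var_T(r)).

    Hypothetical choices: U^E(x) = log(1 + x) and U^V(v) = v**2.
    """
    T = len(r)
    mean = sum(r) / T
    var = sum((x - mean) ** 2 for x in r) / T
    return sum(math.log1p(x) for x in r) / T - var ** 2


def grad_phi(r):
    """Gradient of phi; d Var_T / d r(t) = (2 / T) * (r(t) - mean)."""
    T = len(r)
    mean = sum(r) / T
    var = sum((x - mean) ** 2 for x in r) / T
    duv = 2.0 * var  # (U^V)'(var) for U^V(v) = v**2
    return [1.0 / (T * (1.0 + x)) - duv * 2.0 * (x - mean) / T for x in r]


def first_order_gap(x, y):
    """phi(x) + grad_phi(x) . (y - x) - phi(y), which is >= 0 for concave phi."""
    gx = grad_phi(x)
    linear = sum(gi * (yi - xi) for gi, yi, xi in zip(gx, y, x))
    return phi(x) + linear - phi(y)


rng = random.Random(1)
gaps = []
for _ in range(1000):
    x = [rng.uniform(0.0, 2.0) for _ in range(5)]
    y = [rng.uniform(0.0, 2.0) for _ in range(5)]
    gaps.append(first_order_gap(x, y))
print(min(gaps))
```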



VI. Conclusions 

The two main contributions of this work are summarized 
below: 

(1) We propose a novel framework for reward allocation to users who are sensitive to temporal variability in their reward allocations. The formulation allows tradeoffs between the mean and the variability of each user's reward allocation through an appropriate choice of the functions $(U_i^E, U_i^V)_{i \in \mathcal{N}}$.

(2) We propose an asymptotically optimal online algorithm, AVR, to solve problems in this framework.

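Appendix A
Proof of Lemma 4

Fix $c \in \mathcal{C}$ and choose

- $r^*(m^\pi, v^\pi, c) = r^\pi(c)$,
- $\mu^* = \mu^\pi(c)/\pi(c)$,
- $\gamma_i^* = \gamma_i^\pi(c)/\pi(c)$ for all $i \in \mathcal{N}$.

Then $r^*(m^\pi, v^\pi, c)$, together with $\mu^*$ and $(\gamma_i^* : i \in \mathcal{N})$, satisfies (16)-(18). This can be verified using the fact that $(r^\pi(c) : c \in \mathcal{C})$, $(\mu^\pi(c) : c \in \mathcal{C})$ and $((\gamma_i^\pi(c))_{i \in \mathcal{N}} : c \in \mathcal{C})$ satisfy (9)-(11). This proves part (a).

Part (b) follows from part (a) and the definitions of $m^\pi$ and $v^\pi$.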

Appendix B
Proof of Lemma 5

For each $c \in \mathcal{C}$, $r^*(m_1, v_1, c)$ is an optimal solution to OPTAVR. Thus there exist non-negative constants $\mu_1^*(c)$ and $(\gamma_{1i}^*(c) : i \in \mathcal{N})$, like those in KKT-OPTAVR given in (16)-(18), such that for all $i \in \mathcal{N}$,

$$(U_i^E)'(r_i^*(c)) - 2(U_i^V)'(v_{1i})\left(r_i^*(c) - m_{1i}\right) + \gamma_{1i}^*(c) - \mu_1^*(c)\frac{\partial c}{\partial r_i}(r^*(c)) = 0,$$

$$\mu_1^*(c)\, c(r^*(c)) = 0,$$

$$\gamma_{1i}^*(c)\left(r_i^*(c) - r_{\min}\right) = 0,$$

where we write $r^*(c)$ instead of $r^*(m_1, v_1, c)$ for brevity.

For each $i \notin \mathcal{N}_{vn}$, $U_i^V$ is linear, so $(U_i^V)'(\cdot)$ is a constant and hence independent of its argument. Thus, $(U_i^V)'(v_{1i}) = (U_i^V)'(\mathrm{Var}(r_i^*(C^\pi)))$. Further, $(m_1, v_1) \in \mathcal{H}^*$ and hence satisfies (19)-(20). In particular, (20) gives the same identity for each $i \in \mathcal{N}_{vn}$. Using these arguments, we can rewrite the above equations as follows: for all $c \in \mathcal{C}$,

$$(U_i^E)'(r_i^*(c)) - 2(U_i^V)'\left(\mathrm{Var}(r_i^*(C^\pi))\right)\left(r_i^*(c) - E[r_i^*(C^\pi)]\right) + \gamma_{1i}^*(c) - \mu_1^*(c)\frac{\partial c}{\partial r_i}(r^*(c)) = 0,$$

$$\mu_1^*(c)\, c(r^*(c)) = 0,$$

$$\gamma_{1i}^*(c)\left(r_i^*(c) - r_{\min}\right) = 0.$$

Now, for each $c \in \mathcal{C}$, multiply the above equations by $\pi(c)$. This yields KKT-OPTSTAT ((9)-(11)) with Lagrange multipliers $(\pi(c)\mu_1^*(c) : c \in \mathcal{C})$ and $((\pi(c)\gamma_{1i}^*(c))_{i \in \mathcal{N}} : c \in \mathcal{C})$. From Lemma 2(a), OPTSTAT satisfies Slater's condition, and hence the KKT conditions are sufficient for optimality of OPTSTAT. Thus, $(r^*(m_1, v_1, c))_{c \in \mathcal{C}}$ is an optimal solution to OPTSTAT. This proves part (a).

Now suppose that $(m_1, v_1), (m_2, v_2) \in \mathcal{H}^*$, and that $r_i^*(m_1, v_1, c_0) \neq r_i^*(m_2, v_2, c_0)$ for some $c_0 \in \mathcal{C}$ and $i \in \mathcal{N}$. Combining this with part (a), $(r^*(m_1, v_1, c))_{c \in \mathcal{C}}$ and $(r^*(m_2, v_2, c))_{c \in \mathcal{C}}$ are two distinct solutions to OPTSTAT. This contradicts the fact that OPTSTAT has a unique solution (see Lemma 2(b)). Thus, (b) has to hold.

Now suppose that $(m_1, v_1), (m_2, v_2) \in \mathcal{H}^*$ and that (c) does not hold, i.e., at least one of the equalities in part (c) fails. For instance, suppose that $v_{1j} \neq v_{2j}$ for some $j \in \mathcal{N}_{vn}$. Both pairs lie in $\mathcal{H}^*$ and thus satisfy (20), so $\mathrm{Var}(r_j^*(m_1, v_1, C^\pi)) \neq \mathrm{Var}(r_j^*(m_2, v_2, C^\pi))$. Hence, $r_j^*(m_1, v_1, c_0) \neq r_j^*(m_2, v_2, c_0)$ for some $c_0 \in \mathcal{C}$. We reach the same conclusion if any of the other equalities in (c) is violated. This conclusion contradicts part (b). Thus, (c) has to hold.

Part (d) follows from part (c) and Lemma 4 part (b). 



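Appendix C
Proof of Lemma 6

Let $\theta^\pi = (m^\pi, v^\pi)$ and $\theta = (m, v)$, and consider the Lyapunov function

$$V(\theta) = E[h(\theta^\pi, C^\pi)] - E[h(\theta, C^\pi)].$$

Then

$$\frac{d}{dt} V(\theta(t)) = \nabla V(\theta(t)) \cdot \frac{d\theta(t)}{dt} = \nabla V(\theta(t)) \cdot \left(g(\theta(t)) + z(\theta(t))\right) = \nabla V(\theta(t)) \cdot g(\theta(t)),$$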

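where the last step follows from Lemma ??. Let $\dot V(\theta) = \nabla V(\theta) \cdot g(\theta)$. Then, from Lemma 3(d) and Lemma ??, we have that for any $\theta \in \mathcal{H}$,

$$\dot V(\theta) = -\sum_{i \in \mathcal{N}} 2(U_i^V)'(v_i)\left(E[r_i^*(\theta, C^\pi)] - m_i\right)^2 - \sum_{i \in \mathcal{N}} (\cdots)\left(E\left[\left(r_i^*(\theta, C^\pi) - m_i\right)^2\right] - v_i\right)^2.$$

The expression above is the negative of a sum of (positively) weighted squares. Hence,

$$\dot V(\theta) \le 0 \quad \forall \theta \in \mathcal{H}. \qquad (23)$$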

Now, let $\mathcal{H}_V = \{\theta \in \mathcal{H} : \dot V(\theta) = 0\}$. $V(\cdot)$ is a continuously differentiable function on the (compact) set $\mathcal{H}$ and satisfies (23). We can therefore use LaSalle's Theorem (see Theorem 4.4 in [?]) to conclude that $\theta(t)$ converges to the largest invariant set in $\mathcal{H}_V$. Let $\mathcal{H}_\pi$ denote this set.

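In the remaining part of the proof, we show that $\mathcal{H}_\pi \subseteq \mathcal{H}_*$, from which the main claim follows.

Note that $\dot V(\theta) = 0$ for any $\theta \in \mathcal{H}_\pi$. Using the expression for $\dot V(\cdot)$ given above, we can show that

$$E[r^*(\theta, C^\pi)] = m \quad \forall \theta \in \mathcal{H}_\pi. \qquad (24)$$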

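Also, $\dot V(\theta) = 0$ for any $\theta \in \mathcal{H}_\pi$. For each $i \in \mathcal{N}_{vn}$, $(U_i^V)''$ is bounded away from zero on $[0, v_{\max}]$. Hence,

$$E\left[\left(r_i^*(\theta, C^\pi) - m_i\right)^2\right] = v_i \quad \forall i \in \mathcal{N}_{vn}.$$

By this conclusion and (24), any $\theta \in \mathcal{H}_\pi$ satisfies (19)-(20), i.e., $\theta \in \mathcal{H}^*$. Since $\mathcal{H}^* \subseteq \mathcal{H}_*$, we have $\mathcal{H}_\pi \subseteq \mathcal{H}_*$. Finally, $\theta(t)$ converges to $\mathcal{H}_\pi$, so $\theta(t)$ converges to $\mathcal{H}_*$, and the result follows.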



Appendix D
Proof of Lemma 8

For any $(m, v) \in \mathcal{H}_*$, $m = m^\pi$, and by Lemma 7, $\theta(t)$ converges to $\mathcal{H}_*$. Hence, (a) holds.

To show (b), pick some $c \in \mathcal{C}$. Note that $r^*(\theta, c)$ is a uniformly continuous function of $\theta$ on $\mathcal{H}$. Uniform continuity follows from the continuity of $r^*(\theta, c)$, proved in Lemma 3(a), together with the compactness of $\mathcal{H}$.

Hence, for any $\epsilon > 0$, we can find a $\delta > 0$ with the following property. For any $\theta' \in \mathcal{H}$, $d(r^*(\theta, c), r^*(\theta', c)) < \epsilon$ for any $\theta \in \mathcal{H}$ with $d(\theta, \theta') < \delta$. Here $d$ denotes the Euclidean distance metric. In particular, this holds for any $\theta' \in \mathcal{H}_*$.

By the definition of $\mathcal{H}_*$, $(r^*(\theta', c))_{c \in \mathcal{C}} = (r^*(\theta^\pi, c))_{c \in \mathcal{C}}$ whenever $\theta' \in \mathcal{H}_*$. Thus, $d(r^*(\theta, c), r^*(\theta^\pi, c)) < \epsilon$ for any $\theta \in \mathcal{H}$ with $d(\theta, \theta') < \delta$.

By Lemma 7, $(\theta_t)$ converges to the set $\mathcal{H}_*$. Hence, for sufficiently large $t$, $d(\theta_t, \theta') < \delta$ for some $\theta' \in \mathcal{H}_*$, and thus $d(r^*(\theta_t, c), r^*(\theta^\pi, c)) < \epsilon$. Thus, part (b) holds.

Parts (c) and (d) can be proved by a similar approach, using the following facts:

(i) $\theta(t)$ converges to $\mathcal{H}_*$;

(ii) $(U_i^V)'(v_i) = (U_i^V)'(v_i^\pi)$ for any $(m, v) \in \mathcal{H}_*$;

(iii) for each $i \in \mathcal{N}$, $(U_i^V)'(\cdot)$ is uniformly continuous on $[0, v_{\max}]$.

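Appendix E
Proof of Lemma 9

Consider any realization $(c_t)_t$ of $(C_t)_t$. For any $c \in \mathcal{C}$, Lemma 8(b) and the ergodicity of $(C_t)_t$ give

$$\lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}(c_t = c)\, r^*(\theta_t, c) = \pi(c)\, r^*(\theta^\pi, c).$$

Note that $r^*(\theta_t, c_t) = \sum_{c \in \mathcal{C}} \mathbb{1}(c_t = c)\, r^*(\theta_t, c)$ and that $\mathcal{C}$ is a finite set. We can therefore use the above result to conclude that

$$\begin{aligned}
\lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} r^*(\theta_t, c_t) &= \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T}\sum_{c \in \mathcal{C}} \mathbb{1}(c_t = c)\, r^*(\theta_t, c) \\
&= \sum_{c \in \mathcal{C}} \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}(c_t = c)\, r^*(\theta_t, c) \\
&= \sum_{c \in \mathcal{C}} \pi(c)\, r^*(\theta^\pi, c) \\
&= \sum_{c \in \mathcal{C}} \pi(c)\, r^\pi(c),
\end{aligned}$$

where the last step uses Lemma 4(a). Since $\sum_{c \in \mathcal{C}} \pi(c)\, r_i^\pi(c) = m_i^\pi = \lim_{t \to \infty} m_i(t)$ by Lemma 8(a), this proves part (a).

Part (b) can be proved by a similar approach, using the ergodicity of $(C_t)_t$ and part (c) of Lemma 8.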
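The decomposition used in the proof of part (a) can be simulated directly. The sketch assumes i.i.d. states drawn from a made-up distribution $\pi$; i.i.d. sequences are a simple ergodic special case. It also assumes a hypothetical allocation $r^*(\theta_t, c) = r(c) + 1/t$ that converges to a limit $r(c)$, playing the role of Lemma 8(b). It tracks both the overall time average and the per-state averages $\frac{1}{T}\sum_t \mathbb{1}(c_t = c)\, r^*(\theta_t, c)$. The function name `ergodic_averages` and its parameters are illustrative.

```python
import random


def ergodic_averages(pi, r_limit, T, seed=0):
    """Time averages of r*(theta_t, c_t) and of 1(c_t = c) r*(theta_t, c).

    Hypothetical toy model: c_t i.i.d. from pi, and r*(theta_t, c) = r_limit[c] + 1/t.
    """
    rng = random.Random(seed)
    states = list(pi)
    weights = [pi[c] for c in states]
    total = 0.0
    per_state = {c: 0.0 for c in states}
    for t in range(1, T + 1):
        c_t = rng.choices(states, weights=weights)[0]
        alloc = {c: r_limit[c] + 1.0 / t for c in states}  # r*(theta_t, c)
        total += alloc[c_t]
        for c in states:
            indicator = 1.0 if c_t == c else 0.0
            per_state[c] += indicator * alloc[c]
    return total / T, {c: s / T for c, s in per_state.items()}


pi = {"low": 0.3, "high": 0.7}
r_limit = {"low": 1.0, "high": 2.0}
avg, per = ergodic_averages(pi, r_limit, T=100_000)
print(avg, per)
```

The overall average equals the sum of the per-state averages exactly. Each per-state average approaches $\pi(c)\,r(c)$, so the overall average approaches $\sum_c \pi(c)\,r(c)$, which is 1.7 for these made-up numbers.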



